Temperature-Free Loss Function for Contrastive Learning
As one of the most promising methods in self-supervised learning, contrastive learning has achieved a series of breakthroughs across numerous fields. A predominant approach to implementing contrastive learning is to apply the InfoNCE loss: by capturing the similarities between pairs, the InfoNCE loss enables learning representations of data. Despite its success, adopting the InfoNCE loss requires tuning a temperature, a core hyperparameter for calibrating similarity scores. Although several studies have emphasized its significance and its sensitivity to performance, searching for a valid temperature requires extensive trial-and-error experiments, which increases the difficulty of adopting the InfoNCE loss. To address this difficulty, we propose a novel method for deploying the InfoNCE loss without a temperature. Specifically, we replace temperature scaling with the inverse hyperbolic tangent function, resulting in a modified InfoNCE loss. In addition to hyperparameter-free deployment, we observed that the proposed method even yielded a performance gain in contrastive learning. Our detailed theoretical analysis reveals that the current practice of temperature scaling in the InfoNCE loss causes serious problems in gradient descent, whereas our method provides desirable gradient properties. The proposed method was validated on five contrastive learning benchmarks, yielding satisfactory results without temperature tuning.
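The substitution the abstract describes can be sketched for a single anchor as follows. This is a minimal illustration, not the paper's full formulation (which may differ in batch construction and other details); only the replacement of the scaled similarity s/τ by atanh(s) follows the abstract, and the variable names are our own.

```python
import numpy as np

def info_nce(sim, temperature):
    # Standard InfoNCE for one anchor: cosine similarities are scaled
    # by 1/temperature; the positive pair sits at index 0.
    logits = sim / temperature
    return -logits[0] + np.log(np.exp(logits).sum())

def temperature_free_info_nce(sim):
    # Sketch of the proposed variant: temperature scaling is replaced
    # by the inverse hyperbolic tangent, which maps cosine similarities
    # in (-1, 1) to the whole real line without any hyperparameter.
    logits = np.arctanh(sim)
    return -logits[0] + np.log(np.exp(logits).sum())

# One anchor: similarity to its positive (index 0) and three negatives.
sims = np.array([0.9, 0.1, -0.2, 0.05])
loss_standard = info_nce(sims, temperature=0.1)
loss_temp_free = temperature_free_info_nce(sims)
```

Note that atanh grows steeply as similarities approach ±1, which plays the calibrating role that 1/τ plays in the standard loss, but without a tunable constant.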
Privacy-Preserving Low-Rank Adaptation for Latent Diffusion Models
Luo, Zihao, Xu, Xilie, Liu, Feng, Koh, Yun Sing, Wang, Di, Zhang, Jingfeng
Low-rank adaptation (LoRA) is an efficient strategy for adapting latent diffusion models (LDMs) on a private dataset to generate specific images by minimizing the adaptation loss. However, LoRA-adapted LDMs are vulnerable to membership inference (MI) attacks, which can judge whether a particular data point belongs to the private dataset, thus leading to privacy leakage. To defend against MI attacks, we first propose a straightforward solution: Membership-Privacy-preserving LoRA (MP-LoRA). MP-LoRA is formulated as a min-max optimization problem in which a proxy attack model is trained by maximizing its MI gain while the LDM is adapted by minimizing the sum of the adaptation loss and the MI gain of the proxy attack model. However, we empirically find that MP-LoRA suffers from unstable optimization, and we theoretically analyze that the potential reason is the unconstrained local smoothness, which impedes privacy-preserving adaptation. To mitigate this issue, we further propose Stable Membership-Privacy-preserving LoRA (SMP-LoRA), which adapts the LDM by minimizing the ratio of the adaptation loss to the MI gain. Furthermore, we theoretically prove that the local smoothness of SMP-LoRA can be constrained by the gradient norm, leading to improved convergence. Our experimental results corroborate that SMP-LoRA can indeed defend against MI attacks and generate high-quality images. Our code is available at https://github.com/WilliamLUO0/StablePrivateLoRA.
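The min-max structure of MP-LoRA can be illustrated with a scalar toy problem. The two loss functions below are illustrative stand-ins, not the paper's actual objectives; only the alternating pattern — the proxy attacker ascending on its MI gain, the LDM descending on adaptation loss plus MI gain — follows the abstract.

```python
# Toy scalar illustration of MP-LoRA's min-max structure. The LDM's
# LoRA weights and the proxy attack model are each reduced to a single
# scalar parameter; the loss surfaces are made-up stand-ins.

theta = 1.0   # stands in for the LDM's LoRA parameters
phi = 0.5     # stands in for the proxy attack model's parameters
lr = 0.1

def adaptation_loss(theta):
    return (theta - 0.2) ** 2          # toy adaptation objective

def mi_gain(theta, phi):
    return -(phi - theta) ** 2 + 1.0   # toy MI gain, peaks at phi = theta

for _ in range(200):
    # Proxy attacker: maximize its MI gain (gradient ascent on phi).
    grad_phi = -2.0 * (phi - theta)
    phi += lr * grad_phi
    # LDM: minimize adaptation loss + MI gain (gradient descent on theta).
    grad_theta = 2.0 * (theta - 0.2) + 2.0 * (phi - theta)
    theta -= lr * grad_theta
```

In this toy setting the alternation settles at the adaptation optimum with the attacker tracking it; the abstract's point is that in the real MP-LoRA problem the analogous dynamics can be unstable, motivating the ratio-based SMP-LoRA objective.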
On the Ideal Number of Groups for Isometric Gradient Propagation
Kim, Bum Jun, Choi, Hyeyeon, Jang, Hyeonah, Kim, Sang Woo
Recently, various normalization layers have been proposed to stabilize the training of deep neural networks. Among them, group normalization is a generalization of layer normalization and instance normalization by allowing a degree of freedom in the number of groups it uses. However, to determine the optimal number of groups, trial-and-error-based hyperparameter tuning is required, and such experiments are time-consuming. In this study, we discuss a reasonable method for setting the number of groups.

These behave similarly in that they apply mean and standard deviation (std) normalization and an affine transform. The difference lies in the units used for computing the mean and std. For example, for n features, layer normalization computes a single mean and std for normalization, whereas instance normalization computes n means and stds. Meanwhile, group normalization partitions n features into G groups to compute G means and stds. From this perspective, layer normalization is a special case of group normalization for G = 1, and instance normalization is a special case of group normalization for G = n. Thus, group normalization is more comprehensive and has a degree of freedom from the setting of the number of groups.
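The special-case relationship described above can be checked directly. The following is a minimal NumPy sketch of group normalization (the affine transform is omitted, and the feature layout is a simplified assumption): G = 1 recovers layer normalization, and G = n recovers instance normalization.

```python
import numpy as np

def group_norm(x, G, eps=1e-5):
    # x has shape (n, m): n features (channels) with m spatial positions.
    # Partition the n features into G groups, then normalize each group
    # by its own mean and std (the affine transform is omitted here).
    n, m = x.shape
    assert n % G == 0, "n must be divisible by G"
    g = x.reshape(G, (n // G) * m)
    mean = g.mean(axis=1, keepdims=True)
    std = g.std(axis=1, keepdims=True)
    return ((g - mean) / (std + eps)).reshape(n, m)

x = np.array([[1.0, 2.0], [3.0, 5.0], [4.0, 8.0], [2.0, 6.0]])  # n=4, m=2

layer_norm = group_norm(x, G=1)     # one mean/std over all features
two_groups = group_norm(x, G=2)     # two groups of two features each
instance_norm = group_norm(x, G=4)  # one mean/std per feature
```

Every intermediate value of G between these two extremes yields a distinct normalization, which is exactly the degree of freedom, and the tuning burden, that the abstract refers to.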